Optimizing information flow in small genetic networks. III. A self— interacting gene 
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Living cells must control the reading out or "expression" of information encoded in their genomes, 
and this regulation often is mediated by transcription factors — proteins that bind to DNA and 
either enhance or repress the expression of nearby genes. But the expression of transcription factor 
proteins is itself regulated, and many transcription factors regulate their own expression in addition 
to responding to other input signals. Here we analyze the simplest of such self-regulatory circuits, 
asking how parameters can be chosen to optimize information transmission from inputs to outputs 
in the steady state. Some nonzero level of self-regulation is almost always optimal, with self- 
activation dominant when transcription factor concentrations are low and self-repression dominant 
when concentrations are high. In steady-state the optimal self-activation is never strong enough to 
induce bistability, although there is a limit in which the optimal parameters are very close to the 
critical point. 



I. INTRODUCTION 

In order to function and survive in the world, cells must 
make decisions about the reading out or "expression" of 
genetic information. This happens when a bacterium 
makes more or less of an enzyme to exploit the varia- 
tions in the availability of a particular type of sugar, and 
when individual cells in a multicellular organism commit 
to particular fates during the course of embryonic devel- 
opment. In all such cases, the control of gene expression 
involves the transmission of information from some in- 
put signal to the output levels of the proteins encoded 
by the regulated genes. Although the notion of informa- 
tion transmission in these systems usually is left infor- 
mal, the regulatory power that the system can achieve — 
the number of reliably distinguishable output states that 
can be accessed by varying the inputs — is measured, log- 
arithmically, by the actual information transmitted, in 
bits [U H]. Since relevant molecules often are present 
at relatively low concentrations, or even small absolute 
numbers, there are irreducible physical sources of noise 
that will limit the capacity for information transmission. 
Cells thus face a tradeoff between regulatory power (in 
bits) and resources (in molecule numbers). What can 
cells do to maximize their regulatory power at fixed ex- 
penditure of resources? More precisely, what can they 
do to maximize information transmission with bounded 
concentrations of the relevant molecules? 

We focus on the case of transcriptional regulation, 
where proteins — called transcription factors (TFs) — bind 
to sites along the DNA and modulate the rate at which 
nearby genes are transcribed into messenger RNA. Be- 
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cause many of the regulated genes themselves code for 
TF proteins, regulatory interactions form a network. The 
general problem of optimizing information flow through 
such regulatory networks is quite hard, and we have tried 
to break this problem into manageable pieces. Given 
the signal and noise characteristics of the regulatory in- 
teractions, cells can try to match the distribution of in- 
put transcription factor concentrations to these features 
of the regulatory network; even simple versions of this 
matching problem make experimentally testable predic- 
tions [2111]. Assuming that this matching occurs, some 
regulatory networks still have more capacity to trans- 
mit information, and we can search for these optimal 
networks by varying both the topology of the network 
connections and the strengths of the interactions along 
each link in the network (the "numbers on the arrows" 
[5]). We have addressed this problem first in simple net- 
works where a single input transcription factor regulates 
multiple non-interacting genes [B], and then in interact- 
ing networks where the interactions have a feedforward 
structure [TJ. But real genetic regulatory networks have 
loops, and our goal here is to study the simplest such 
case, where a single input transcription factor controls a 
single self-interacting gene. Does feedback increase the 
capacity of this system to transmit information? Are 
self-activating or self-repressing genes more informative? 
Since networks with feedback can exhibit multistability 
or oscillation, and hence a nontrivial phase diagram as 
a function of the underlying parameters, where in this 
phase diagram do we find the optimal networks? 

Auto-regulation, both positive and negative, is one of 
the simplest and most commonly observed motifs in ge- 
netic regulatory networks [8l410j . and has been the focus 
of a number of experiments and modeling studies (see, 
for example, Refs [HJH1]). A number of proposals have 
been advanced to explain its ubiquitous presence. Nega- 
tive feedback (self-repression) can speed up the response 
of the genetic regulatory element [13], and can reduce 



the steady state fluctuations in the output gene expres- 
sion levels [14] . Positive feedback (self-activation) , on the 
other hand, slows down the dynamics of gene expression 
and sharpens the response of a regulated gene to its ex- 
ternal input. Self-activating genes could thus threshold 
graded inputs, transforming them into discrete, almost 
"digital" outputs [T5], allowing the cell to implement 
binary logical functions [16) . If self-activation is very 
strong, it can lead to multistability, or switch-like behav- 
ior of the response, so that the genetic regulatory element 
can store the information for long periods of time [T?HF3] ; 
such elements will also exhibit hysteretic effects. Weak 
self-activation, which does not cause multistability, has 
been studied less extensively, but could play a role in al- 
lowing the cell to implement a richer set of input/output 
relations [10]. Alternatively, if the self-activating gene 
product can diffuse into neighboring nuclei of a multi- 
cellular organism, the sharpening effect of self-activation 
can compensate for the "blurring" of responses due to 
diffusion and hence open more possibilities for noise re- 
duction through spatial averaging [20j EI] • 

Many of the ideas about the functional role of auto- 
regulation are driven by considerations of noise reduc- 
tion. The physical processes by which the regulatory 
molecules find and bind to their regulatory sites on the 
DNA, the operation of the transcriptional machinery, it- 
self subject to thermal fluctuations, and the unavoidable 
shot noise inherent in producing a small number of out- 
put proteins all contribute towards the stochastic nature 
of gene expression and thus place physical limits on the 
reliability of biological computation [22-25] . In the past 
decade the advance of experimental techniques has en- 
abled us to measure the total noise in gene expression 
and sometimes parse apart the contributions of various 
molecular processes towards this "grand total" [261 - 134] . 
With the detailed knowledge about noise in gene expres- 
sion we can revisit the original question and ask: can 
both forms of auto-regulation help mitigate the deleteri- 
ous effects of noise on information flow through the reg- 
ulatory networks and if so, how? 

All of our previous work in information transmission 
in transcriptional regulation has been in the steady state 
limit. A similar approach was taken in Ref 53J, where the 
authors analyze information flow in elementary circuits 
including feedback, but with different model assumptions 
about network topology and noise. More recently, de 
Ronde and colleagues have systematically reexamined the 
role of feedback regulation on the fidelity of signal trans- 
mission for time varying, Gaussian signals in cases where 
the (nonlinear) behavior of the genetic regulatory ele- 
ment can be linearized around some operating point [36] . 
They found that auto-activation increases gain-to-noise 
ratios for low frequency signals, whereas auto-repression 
yields an improvement for high frequency signals. While 
many of the functions of feedback involve dynamics, as 
far as we know all analyses of information transmission 
with dynamical signals resort to linear approximations. 
Here we return to the steady state limit, where we can 
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FIG. 1: A schematic diagram of a self-regulating gene. The 
gene F is depicted by a thick black line and a promoter start 
signal. Gene products g denoted as blue circles can bind to the 
regulatory sites (one in this example) that control the expres- 
sion of r. Direct control over the expression of F is exerted 
by molecules of the transcription factor c (green diamonds, 
two binding sites). 



treat gene regulatory elements as fully nonlinear devices. 
While we hope that our analysis of the self regulated 
gene is interesting in itself, we emphasize that our goal 
is to build intuition for the analysis of more general net- 
works with feedback. 



II. FORMULATING THE PROBLEM 

Figure [l] shows a schematic of the system that we will 
analyze in this paper, a gene T that is controlled by two 
regulators: directly by an external transcription factor, 
as well as in a feedback fashion by its own gene prod- 
ucts. We will refer to the transcription factor as the reg- 
ulatory input; its concentration in the relevant (cellular 
or nuclear) volume will be denoted by c. In addition, 
the gene products of T, whose number in the relevant 
volume f2 we denote by G and to which we refer to as 
the output, can also bind to the regulatory region of T, 
thereby activating or repressing the gene's expression. As 
we attempt to make our description of this system math- 
ematically precise, the heart of our model will be the 
regulatory function that maps the concentrations of the 
two regulators at the promotor region of T to the rate at 
which output molecules are synthesized. 

We can write the equation for the dynamics of gene 
expression from T by assuming that synthesis and degra- 
dation of the gene products are single kinetic steps, in 
which case we have 



dG 
~dt 



r m ax/(c,7) - -G + £ 



(1) 



here, r max is the maximum rate for production of G, r is 
the protein degradation time, and 7 is the concentration 
of the output molecules in the relevant volume fi. To in- 
clude the noise effects inherent in creating and degrading 
single molecules of G we introduce the Langevin force £, 
and we will discuss the nature of this and other noise 
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sources in detail later. Importantly, departures from our 
simplifying assumptions about the kinetics can, in part, 
be captured by proper treatment of the noise terms, as 
discussed below. 

We are interested in the information that the steady 
state output of T provides about the input concentration 
c. Following our previous work [51 [7], we address this 
problem in stages. First we relate information transmis- 
sion to the response properties and noise in the regula- 
tory element, using a small noise approximation to allow 
analytic progress (Section II A I. Then we show how the 
relevant noise variances can be computed from the model 
in Eq , taking advantage of our understanding of the 
phys ics underlying the essential sources of noise (Section 
II B| ; this discussion is still quite general, independent 
of the details of the regulation function. Then we ex- 
plain our choice of the regulation function, adapted from 
the Monod-Wyman-Changeux description of allosteric 
interactions (Section II C I. Because feedback allows for 
bifurcations, we have to map the phase diagram of our 
model (Section II D ), and develop approximations for the 
information transmission near the critical point (Section 
II E[ ) and in the bistable regime (Section II F). Our dis- 
cussion reviews some earlier results, in the interest of 



being self-contained, but the issues in Sections II D II F 
are all new to the case of networks with feedback. 



A. Noise and information transmission 

We are interested in computing the mutual informa- 
tion between the input and the output of a regulatory 
element, in steady state. We have agreed that the input 
signal is the concentration c of the transcription factor, 
and we will take the output to be the concentration g 
of the gene products, which we colloquially call the ex- 
pression level of the gene V. An important feature of the 
information transmission is that its mathematical defini- 
tion is independent of the units that we use in measuring 
these concentrations, so when we later choose some natu- 
ral set of units we won't have to worry about substituting 
into the formulae we derive here. 

Following Shannon [T], the mutual information be- 
tween c and g is defined by 



I(c;g) = J dc J dgP{c,g)\og 2 



P(c,g) 



Pin(c)Pout{g) 



bits, 



. (2) 

where input concentrations c are drawn from the distri- 
bution Pi n (c), the output expression levels that we can 
observe are drawn from the distribution P out (g), and the 
joint distribution of these two quantities is P(c,g). We 
think of the expression level as responding to the inputs, 
but this response will be noisy, so given the input c there 
is a conditional distribution P(g\c). Then the symmetric 



expression for the mutual information in Eq ^ can be 
rewritten as a difference of entropies, 

I(c;g) = S[P out (g)} - [ dc P in (c)S[P(g\c)}, (3) 



where the entropy of a distribution is defined, as usual, 
by 



(4) 



S[P(x)] = - J dxP{x)\og 2 P{x). 
Finally, we recall that 

Pout(<7) = / dcP in {c)P(g\c). (5) 



Notice that the mutual information is a functional of 
two probability distributions, Pj n (c) and P(g\c). The lat- 
ter distribution describes the response and noise char- 
acteristics of the regulatory element, and is something 
we will be able to calculate from Eq 0. Following 
Refs [21 HI |6j [7] , we may then ask: given that P{g\c) is 
determined by the biophysical properties of the genetic 
regulatory element, what is the optimal choice of Pi n (c) 
that will maximize the mutual information I(c;g)l To 
this end we have to solve the problem of extremizing 



£[P in (c)] =I(c;g)-A / dc P in (c) 



(6) 



where the Lagrange multiplier A enforces the normaliza- 
tion of -Pm(c). Other "cost" terms are possible, such as 
adding a term proportional to J dccP; n (c), which would 
penalize the average cost of input molecules c, although 
here we take the simpler approach of fixing the maximum 
possible value of c, which is almost equivalent [6 . If the 
noise were truly zero, we could write the distribution of 
outputs as 



Pout(ff) = J dcP in (c)5[g 



9(c)], 



(7) 



where g(c) is the average output as a function of the 
input, i.e. the mean of the distribution P(g\c). Then if 
the function g(c) is invertible, we can write the entropy 
of the output distribution as 



S[P ut(g)] = - J dgP ont {g)\og 2 P out (g) 



dcP in (c) log 



Pn(c) 



dg(c) 



dc 



and we can think of this as the first term in an expansion 
in powers of the noise level [3] . Keeping only this leading 
term, we have 
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C[P in (c)} = - / dcP in (c) log 2 



Pi„(c) 



dg(c) 



dc 



dcP in (c)S[P(g\c)]-A J dcP m {c) 



(9) 



r 



and one can then show that the extremum of C occurs at 



P* ( c ) = ±2~ s ^\-)] 



Z 



dg(c) 



dc 



(10) 



where the entropy is measured in bits, as above, and the 
normalization constant 



Z = dc 



dg(c) 



dc 



2 -S[P( ff |c)]_ 



(11) 



The maximal value of the mutual information is then 
simply I* = log 2 Z. 

In the case where P(g\c) is Gaussian, 



P(g\c) 



i 



exp 



9(c))' 



(12) 



the entropy is determined only by the variance cri{c), 



1 



S[P(g\c)} = -log 2 [27re^(c)] 



(13) 



It is useful to think about propagating this (output) noise 
variance back through the input/output relation g(c), to 
define the effective noise at the input, 



^c(c) 



dg(c 



dc 



Then we can write 

K(c) 



i l 

Z v / 2~7re<7 c (c) 
As before, Z is the normalization constant, 

"I 1/2 



Z= / dc 
Jo 



1 /rfgV 1 
27re ^cfc/ 0-2 (c) 



(14) 



(15) 



(16) 



where C is the maximal value of the input concentration, 
and again we have the information I*(c;g) = log 2 Zbits. 



B. Noise variances 



Equation ( 16 ) relates Z, and hence the information 



transmission /* = log 2 Z, to the steady state response 
and noise in our simple regulatory element. These quan- 
tities are calculable from the dynamical model in Eq ([I]), 
if we understand the sources of noise. There are two 
very different kinds of noise that we need to include in 
our analysis. 



First, we are describing molecular events that syn- 
thesize and degrade individual molecules, and individ- 
ual molecules behave randomly. If we say that there is 
synthesis of f molecules per second on average, then if 
the synthesis is limited by a single kinetic step, and if all 
molecules behave independently, then the actual rate r{t) 
will fluctuate with a correlation function (5r(t)6r(t')) = 
f5(t — t'). Similarly, if on average there is degradation of 
d molecules per second, then the actual degradation rate 
d(t) will fluctuate with (Sd(t)5d(t')) = dS(t-t'). Thus if 
we want to describe the time dependence of the number 
of molecules N(t), we can write 



dN 

~dt 



where 



= r(t) - d(t) = r-d + £(t), 



£(t) = 5r(t)-5d(t). 



(17) 



(18) 



If we are close to the steady state, r = d, and if synthesis 
and degradation reactions are independent, we have 



(WW)) =2d8{t-t'). 



(19) 



If some of the reactions involve multiple kinetic steps, or 
if the molecules we are counting are amplified copies of 
some other molecules, then the noise will be proportion- 
ally larger or smaller, and we can take account of this by 
introducing a "Fano factor" v, so that 



(0) -> 2vd ^ 1 - 



(20) 



For more about the Langevin description of noise in 
chemical kinetics, see Ref |37j . 

The second irreducible source of noise is that the 
synthesis reactions are regulated by transcription factor 
binding to DNA, and these molecules arrive randomly at 
their targets. One way to think about this is that the 
concentrations of TFs which govern the synthesis rate 
are not the bulk average concentrations over the whole 
cell or nucleus, but rather concentrations in some small 
"sensitive volume" determined by the linear size I of the 
targets themselves [22J, [33 |39] . Concretely, if we write 
the synthesis rate as 



</(c,7) = 



(21) 



where c is the local concentration of the input transcrip- 
tion factor and 7 is the concentration of the gene product 
that feeds back to regulate itself, we should really think 
of these concentrations as c = c + £ c and 7 = G/Q + £ g , 
where we separate the mean values and the local fluctu- 
ations; note that the mean gene product concentration 
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is the ratio of the molecule number G to the relevant 
volume fl. The local concentration fluctuations are also 
white, and the spectral densities are given accurately by 
dimensional analysis |22, 38 , so that 

(&(t)&(0) = (2c/D£)S(t-t') (22) 
<£y(*)£y(*')) = (2G/nDe)6(t-t), (23) 

where D is the diffusion constant of the transcription 



factor molecules, which we assume is the same for the 
input and output proteins. 

We can put all of these factors together if the noise 
is small, so that it drives fluctuations which stay in the 
linear regime of the dynamics. Then if the steady state 
solution to Eq ([!]) in the absence of noise is denoted by 
G — G(c), we can linearize in the fluctuations SG — 
G-G: 



dG 
lit 



d(SG) 



J(c, 7 )--G + £ 
r 

J(c + £ c ,G/ft + £ 7 )--G + £ 



dt 

(£eff(i)£eff(t')) 



n 



G 

v — 

T 



d~< 



7=G/n 
9/(c,7) 



SG + £ c ff (£), where 



7=G/n, 



2G 
VLDi 



df(c,G/il) 
dc 



2c 
Dt 



5(t-t'). 



(24) 
(25) 

(26) 



To solve this problem and compute the variance in the output number of molecules ((SG) 2 ), it is useful to recall 
the Langevin equation for the position x(t) of an overdamped mass tethered by a spring of stiffness n, subject to a 
drag force proportional to the velocity, -Fdrag 



-ctv: 
dx 



o 



dt 



-kx + C(t) 



(C(t)C(t')) = 2k B Ta5(t-t'). 

From equipartition we know that these dynamics predict the variance (x 2 ) 
Langevin description of the synthesis and degradation reactions, we find 



(27) 
(28) 

hsT/n. Identifying terms with our 



((SG) 2 ) = — 



1 

n dt L 



G 



df 



G 

abl 



df 
'dc 



c 

m 



(29) 



where we understand that the partial derivatives of / are to be evaluated at the steady state 7 = G/Cl. 



We have defined 



as the maximum synthesis rate, so that the regulation function / is in the range < / < 1, 



and hence the maximum mean expression level is G max = r max r. Thus it makes sense to work with a normalized 
expression level g = G/(r max r), and to think of the regulation function / as depending on g rather than on the 
absolute concentration 7. Then we have 



((5g) 2 ) = a 2 g (c) 



1 



1 ~ (df/dg) 



df 
dg 



1 



D£ ln 



01 

dc 



c 

d! 



(30) 



where 7 max is the maximal mean concentration of output molecules. As discussed previously [6], we can think of 
N g = r max r/i> as the maximum number of independent output molecules, and this combines with the other parameters 
in the problem to define a natural concentration scale, cq = N g /(D£r). Once we choose units where cq = I, we have 
a simpler expression, 



1 



1 



N g 1 -(df/dg) 



d£ 
dg 



df 
dc 



(31) 



where we notice that almost all the parameters have been eliminated by our choice of units. 



G 



Finally, we need to use the variance to compute the information capacity of the system, I* = log 2 Z, where from 
Eq ( 16 1 we have 



Jo 



dc 



dc 



1 (dg 



1 



2ire \dc J a 2 (c) 
dg 



- 1/2 






\ Ng ] 




2ire_ 



-1 1/2 



1 - (df/dg) 



dcj g+ (df/dg) 2 (5/7max) + [df/dcfc 
I 



1/2 



(32) 
(33) 



C is the maximum concentration of input transcrip- 
tion factor molecules, in units of cq. Notice that the 
parameter N g just scales the noise and (in the small 
noise approximation) thus adds to the information, /* — 
\og 2 Z + (1/2) log 2 (A r g /27re); the problem of optimizing 
information transmission thus is the problem of optimiz- 
ing Z. Further, because 

9 = f(c,9), (34) 
the total derivative dg/dc can be expressed though 



dg = df_ df_dg_ 
dc dc dg dc 

Putting all of these pieces together, we find 



(35) 



dc 



'21 

dc 



(36) 



In what follows we will start with the assumption that, 
since both input and output molecules are transcrip- 
tion factor proteins, their maximal concentrations are 
the same, and hence 7 max = C; we will return to this 
assumption at the end of our discussion. 

If a regulatory function /(c, g) is chosen from some 
parametric family, Eq ( 36 1 allows us to compute the in- 



formation transmission as a function of these parame- 
ters and search for an optimum. Before embarking on 
this path, however, we note that the integrand of Z can 
have a divergence if df/dg = 1. This is a condition for 
the existence of a critical point, and in this simple sys- 
tem the critical point or bifurcation separates the regime 
of monostability from the regime of bistability. We ex- 
pect that at this point the fluctuations around g are no 



longer Gaussian, and we need to compute higher order 
moments. Thus, Eq (36), as is, can safely be used only 



in the monostable regime away from the critical point; 
in Section [II E| we compute the expression for the mutual 
information near to and at the critical point for a par- 
ticular choice of /. There are even more problems in the 
bistable regime, since there are multiple solutions to Eq 
(34), and in Section II F we discuss information in the 



bistable regime. 

C. MWC regulatory function 



To continue, we must choose a regulatory function. In 
Ref [7], where we analyzed genetic networks with feed- 
forward interactions, we studied Hill-type regulation [10] 
and Monod-Wyman-Changeaux-like (MWC) regulation 
|41) . and found that the MWC family encompasses a 
broader set of functions than the Hill family; for a re- 
lated discussion see Ref [12] • MWC functions also allow 
for a natural introduction of convergent control, where 
a node in a network is simultaneously regulated by sev- 
eral types of regulatory molecules. Briefly, in the MWC 
model one assumes that the molecule or supermolecular 
complex being considered has two states, which we iden- 
tify here with ON and OFF states of the promoter. The 
binding of each different regulatory factor is always inde- 
pendent, but the binding energies depend on whether the 
complex is in an OFF or ON state, so that (by detailed 
balance) binding shifts the equilibrium between these two 
states. 

In our case, we have two regulatory molecules, the in- 
put transcription factor with concentration c and the 
gene product with concentration g. If there are, re- 
spectively, n c and n g identical binding sites for these 
molecules then the probability of being in the ON state 
is 



f(c,g) 



(i + c/Q° n ) n ^i + g/QT)^ 



L(l + c/Qf) n ° (1 + g/Qf) n o + (1 + c/Q° n )»c (1 + g/Q° n ) n <> 



(37) 



where Q° n ,QS n are the binding constants in the ON state, and similarly Q° tt , Q° tt are the binding constants in 
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the OFF state; L reflects the "bare" free energy difference 
between the two states. If the binding of the regulatory 
molecules has a strong activating effect, then we expect 
Q° n <C Q° s , and similarly for Q g , which means that only 
one binding constant is relevant for each molecule, and 



or n g < 1. When either of these conditions are fulfilled, 
the gene is in the monostable regime. At the critical 
point, K* = (n g - lf/(An g ) and g* = (n g - l)/{2n ? ). 
This is illustrated in Fig [2] as a function of the effective 
input x( c ) = 6 — 7i c ln(l + c/K c ), where, from Eq p9|, 



we will refer to these as K c and K g . Then we can write 9 = n c ln(l + ci/ 2 /K c ) + n g ln(l + l/(2K g )). 



f(c,g) 
F(c,g) 



1 



1 + e - F ( c 'S) 

1 + c/K c 



(38) 



n r In 



1 + c 1/2 /K c 



1.1 + 1/(2*,) 



(39) 



where C\i% is the input concentration at which 5(01/2) = 
1/2. Notice that if binding of g strongly represses the 
gene, then we have Q° s -C Q° n , but this can be simu- 
lated by changing the sign of n g . Thus we should think 
of the parameters n c and n g as being not just the num- 
ber of binding sites, but also an index of activation vs. 
repression. We will also treat these parameters as con- 
tinuous, which is a bit counterintuitive but allows us to 
describe, approximately, situations in which the multiple 
binding sites arc inequivalent, or in which Q on and Q° s 
are not infinitely different. 

From the discussion in the previous section, we will 
need to evaluate the partial derivatives of /(c, g) with re- 
spect to it arguments. For the MWC model, these deriva- 
tives take simple forms: 



OC J^c + c 

dg 



(40) 
(41) 



D. Phase diagram 

Let us start by examining the stability properties of 
Eq (34), which determines the steady state g(c). Viewed 
as a function of g, /(c, <?) is sigmoidal, and so if we try 
to solve g = f(c,g) graphically we are looking for the 
intersection of a sigmoid with the diagonal, as a function 
of g. In doing this we expect that, for some values of the 
parameters, there will be exactly one solution, but that 
as we change parameters (or the input c), there will be a 
transition to multiple solutions. This transition happens 
when / just touches the diagonal, that is, when for some 
g* it holds true that f(g*, c) = g* and df(g, c)\ g */dg — 1. 



Using Eq (41 ), these two conditions can be combined to 



yield an equation for g* : 

ra-r) 



K g +9* 



= 1 



(42) 



This is a quadratic in g* for which no real solution on 
g* e [0, 1] exists if either 



K a > 



K-i) 2 

4n„ 



(43) 



For the special case of n g = 2 it is not hard to compute 
the analytical approximations for the boundary of the 
bistable domain. First, Eq (38) can be expanded for 
large K g , yielding a quadratic equation for g that has 
two solutions only when 



X < - \u4~2\nK g 



(44) 



To get the lower bound, we expand Eq (38) for small 



g and retain terms up to the quadratic order in g; the 
resulting quadratic equation yields two solutions only if 



X > HA/ K g - 3). 



(45) 



Both approximations are plotted as circles and crosses, 
respectively, in Fig |2j and match the exact curves well. 



For other values of n g we solve Eq ( 34 ) exactly, using a 



bisection method to get all solutions for a given c and 
we partition the range of c adaptively into a denser grid 
where the derivative g'(c) is large. For integer values of 
n g when the equation can be rewritten as a polynomial 
in g, it is technically easier to find the roots of the poly- 
nomial; alternatively one can solve for c given g using a 
simple bisection, because c(g) is an injective function. 



E. Information transmission near the critical point 

In this section we will generalize the computation of 
noise and information in the region close to the critical 
point, where the Gaussian noise approximation breaks 
down. We start by rewriting Eq's (24}- 25 1 in our normal- 
ized units, 



dg 

T — 

dt 

(mat')) 

T(g) 



f(c,g)-g + C 

2TT(g)S(t-t') 

df 

dg 



N„ 



g 



7n 



dc 



(46) 
(47) 

(48) 



This is equivalent to Brownian motion of the coordinate 
g in a potential V(g, c) defined by 



dV(g,c) 
dg 



f(c,g) 



(49) 



with an effective temperature T{g) that varies with posi- 
tion. If we simulate this Langevin equation, we will draw 
samples out of the distribution P(g\c), but we can con- 
struct this distribution directly by solving the equivalent 
diffusion or Fokker-Planck equation, 

^ing)P(9,t)}; 

(50) 



dP(g,t) d 



dt 



dg 



[V'(g,c)P(g,t)} 



o 
+ 



c 
I 




the steady-state solution is then P(g\c), and this is 
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FIG. 2: Monostable (green), critical (red) and bistable (blue) 
behavior of the self- activating gene for n g = 2. A) The phase 
diagram as a function of K g and the input-dependent term. 
In the region between the black solid lines two solutions gi : 2 
exist for every value of the input c (y-axis). The corresponding 
critical value is at K* — 1/8 (cusp of the black solid lines). 
Circles and crosses represent analytical approximations to the 
exact boundary of the bistable region for n g = 2 and large 
1/Kg (see text). For three choices of K g denoted by vertical 
dashed lines, the input / output relations g(c) are plotted in B. 
B). The critical solution (red) has an infinite total derivative 
g'{c) at 9 - n c log(l + c/K c ) = 3.29, g = 0.25. The bistable 
system (blue) has three solutions, two stable and one unstable, 
for a range of inputs that can be read out from the plot in A. 



P( 9 \c)=P(0)^-exp 



9 V{y,c) 
T(y) 



dy 



(51) 



The "small noise approximation" in this extended 
framework corresponds to expanding the integrand in 
Eq (51) around the mean, g = g(c) + Sg. If we write 
x = g — g (c), we will find 



P(g\c) oc exp {—a-ix 2 — 03a; 3 — (142; 4 — . . .) 



(52) 



where our previous approximations correspond to keep- 
ing only ai. The critical point is where a 2 = 0, and 
we have to keep higher order terms. In principle the 
expansion coefficients have contributions from the in- 
dependence of the effective temperature, but we have 
checked that these contributions are negligible near crit- 
icality. Then we have 



0-2 = 

«3 = 

d4 = 



l-f 



T 

3! T ' 
4! T ' 



(53) 



where primes denote derivatives with respect to g, and 
all terms should be evaluated at g = g{c). 



For the Monod-Wyman-Changeaux regulatory function in Eq ( 38 1 , all these derivatives can be evaluated explicitly: 
Of 



r 



0g 



fH= &f 
dg 2 



/(I-/) 



/(I-/) 



ng/K g 

1+9/ 'Kg 

ng/K g 
1 + 9 1 K s 



1-2/ 



1 



T dg* J, \l + g/K 9 



1 - 2/ - 



1 - 2/ - — - 2/(1 - /) 

n. 



(54) 
(55) 
(56) 



From Eq (42), the critical point occurs at g* = (n g — 
l)/2n g when K* — (n g — l) 2 /4n g , and at this point the 
derivatives simplify: 



/" - 

f" = 8 "' 



(58) 
(59) 



/' = 1 



(57) 



Now we want to explore behavior in the vicinity of 



9 



the critical point; we will fix K g to its critical value, 



K*(n g ), and compute the derivatives in Eqs ( 54|56 ) as 
df/dg — > 1. Consider therefore a small positive e such 
that 



dg 



(60) 



In a system with chosen K g and n g that yield critical 
behavior, the deviation from criticality above will happen 
at g = g* + A. To find the relation between e and A, we 



evaluate the derivative in Eq ( 54 1 at g to form a function 
ip(g) = g(l — g)n g /(K g + g), which evaluates to 1 at g* . 
This function can be expanded in Taylor series around 
g*; the first order in A vanishes and we find: 



n g (K g + l) 2 
(l + g*/K g r 



(61) 



Therefore, the derivative deviates by e from criticality 
at 1 when g deviates by ±A from the g* . We now per- 
form similar expansions on the second- and third-order 
derivatives in Eqs ( 55|56 l, and evaluate the factors at the 
critical point: 



9/ 
t)g 

dg 2 
dg 3 



An 2 

— i 



8n 



8n 2 g 



■A 



32ni 



(n|-l) a 



A 



(n; 
-4) 



(62) 
(63) 
(64) 



(n 2 



A 



These conditions guarantee that the higher order terms, 
which are evaluated around the critical point, are 
nowhere evaluated too far from the critical point such 
that the approximations would break down. 

We can now put the pieces together by using the gen- 
eral form of the optimal solution for P*(c) in Eq (10 1, 
together with the quartic ansatz for P{g\c) in Eq (52 1. 



For each c, we evaluate two entropies of the conditional 
distribution P(g\c): 



S 2 [P(g\c)} = log 2 ^2iteo 2 g {c) (66) 
S 4 [P(g\c)} = - f dgP(g\c)log 2 P(g\c). (67) 



S4 is the noise entropy with higher-order terms included 
whenever conditions Eq (65) are met, and S 2 is the noise 



entropy in the Gaussian approximation. 

Equation ( 10 ) can be rewritten in a numerically stable 
fashion by realizing that P*(c)\dg/dc\~ 1 — P*(g), that 
is, that the optimal distribution of mean output levels is 
given by 

S[P(g\g)] t z _ 



P*(9) 



(68) 



To join the Gaussian and higher-order approximations 
consistently in the regimes away and near the critical 
point, the noise entropy in Eq ( [69] ) is chosen to be the 
pointwisc minimum of Si(g) and S 2 (g). Finally, the in- 
formation is again / = log 2 Z , with 



Z = 



a(C) 



dg2 



S[P(g\5)] 



(69) 



F. Information transmission in the bistable regime 



These expressions have been evaluated for K* g , but we 
could have easily repeated the calculation by assuming 
that K g itself can deviate a bit from the critical value, 
6 K gi which would yield somewhat more 



i.e. 



K — K* 

9 ~ 



complicated results that we don't reproduce here. 



Equations ( 62 



64 1 can be used in Eq ( 53 1 to write down 



the probability distribution P(g\c). Far away from the 
critical point the Gaussian approximation is assumed to 
hold, and 03, 04 can be set to 0. Close to the critical point 
the higher order terms <!■$ and 04 need to be included. To 
assess the range where this switchover occurs, we com- 
pare in Eqs (62 64 1 the leading to the subdominant cor- 



rection: we insist that the quadratic correction in Eq ( 63 ) 



is always smaller than linear, and that the linear correc- 
tion in Eq ( 64 1 is always smaller than constant (we drop 
the quadratic correction there). We found empirically 
that including the higher-order corrections yields good 
results when the following conditions are simultaneously 
satisfied: 



|A| < 



1 



16n„ 



(65) 



|A| < 0.25. 



We now discuss the information capacity in the 
bistsable regime, away from the critical line. In this 
regime, each value of the input c can give rise to mul- 
tiple solutions of the steady state equation, Eq (34). In 



the simplest case (which includes the MWC regulatory 
functions), there will be two stable solutions, g~i(c) and 
52(c), and a third solution, 93(c), that is unstable. In 
equilibrium, the system will be on the first branch with 
weight wi(c) and on the second with weight w 2 (c). Here 
we place upper bound on the information /(c; g), again in 
the small noise approximation. This will be useful since, 
as we will see, even this upper bound is always less than 
the information which can be transmitted in the monos- 
table or critical regime, and so we will be able to conclude 
that the optimal parameters for which we are searching 
are never in the bistable regime. 

In the bistable regime, the small noise approximation 
(again, away from the critical line) means that the condi- 
tional distributions are well approximated by a mixture 
of Gaussians, 



P(g\c) = wi{c) 



V 2 ™ 2 (c 



(a-si(c)r 

2t?(c) 
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+W2(c) 



V 2 ™ 2 2 (c) 



(70) 



To compute the information we need two terms, the total 
entropy and the conditional entropy. The conditional en- 
tropy takes a simple form if we assume the noise is small 
enough that the Gaussians don't overlap. Then a direct 
calculation shows that, as one might expect intuitively, 
the conditional entropy is just the weighted sum of the 
entropies of the Gaussian distributions, plus a term that 
reflects the uncertainty about which branch the system 



is on, 



1 2 

S[P(9\c)] = -X>i(c)log 2 [27reo?(c)] 



i=l 



^ Wi (c)log 2Wi (c). (71) 



i=i 



Implementing the small noise approximation for the 
total entropy is a bit more subtle. We have, as usual, 



P{g) = / dcP{c)P{g\c) 



s 



(S-Si(c)) 2 



2<7?( C ) 



(9 -52(c)) 2 



(72) 



If the noise is small, each of the two integrals is dominated 
by values of c near the solution of the equation g = g\{c); 



let's call these solutions Ci(g). Notice that these solutions 
might not exist over the full range of g, depending on the 
structure of the branches. Nonetheless we can write 



P(9) 



Wl (c)P{c) 



dgi (c) 



+ 



dc 

w 2 {c)P(c) 



dg 2 (c) 



c=ci(g) 
1" 



dc 



, (73) 



c=&2(g) 



with the convention that if we try to evaluate w;(c) at 
a non-existent value of c\, we get zero. Thus, the full 
distribution P(g) is also a mixture, 



P{g) = fiPi(g) + hP2{g). 



(74) 



The fractional contributions of the two distributions are 
fi = JdcP(c) Wi (c), (75) 

where J. dc ■ ■ ■ denotes an integral over the regions along 
the c axis where the function Ci(g) exists, and the (nor- 
malized) component distributions are 



Pi(9) = 



1 



dg\{c) 



dc 



(76) 



Wi(c)P(c) 

c=Ci(s) 

The entropy of a mixture is always less than the average 
entropy of the components, so we have an upper bound 



S[P(g)} < -Y,hf dgP^logzPig) 

i=l J 



i=l 
2 



Wi(c)P( C ) ■ 

= — ^ / dcP(c)wi(c) log : 



log 2 



c=ci(ff) 



fi 



Wi(c)P(c) 



dgi(c) 


-l" 


dc 





c=ci(g) 



jWi(c)P(c) 



dgi(c) 



dc 



An upper bound on the total entropy is useful because it allows us to bound the mutual information: 
I(c-g) = S[P(g)\- J dcP(c)S[P(g\c)\ 

2 r 

< -J2 dcP(c) Wi (c)log 2 

;_1 Ji 



- Wi (c)P(c) 

Ji 



dg\{c) 



dc 



dcP{c) Wi {c)\og 2 [2neof{cj\ + ^ J dcP(c)wi(c) log 2 Wi(c) 



(77) 
(78) 
(79) 



(80) 



(81) 
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dcP(c) log 2 P(c) + - dcP(c) ^ Mc) log: 



< - / dcP{c)\og 2 P(c) + - / dcP(c)^2wi(c)lo & 



i=i 

2 



i=l 




r 



where in the last step we use the positivity of the entropy 
associated with the mixture weights {/i}. 

We can now ask for the probability distribution P(c) 
that maximizes the upper bound on J(c;<?), and in this 
way we can bound the capacity of the system. Happily, 
the way in which the bound depends on P(c), in Eq (83 ), 



is not so different from the dependencies that we have 
seen in the monostable case [Eq Q], so we can follow a 
parallel calculation to show that 



l(g;c) < log 



dce'^ 



(c) = J2wi(c)\n 



dgi(c) 



dc 



(84) 
.(85) 



Finally, to find the weights u>;(c) we can numerically in- 
tegrate the Fokker-Planck solution in Eq (51 1 to find 



wi(c) 



w 2 {c) 



93(c) 



33(c) 



dg' P(g'\c), 

dg' P(g'\c). 



(86) 



(87) 



To summarize, we have derived an upper bound on 
the information transmitted between the input and the 
output. The tightness of the bound is related to the 
applicability of the "no overlap" approximation, which 
for MWC-like regulatory functions should hold very well, 
as we have verified numerically. If only one of the weights 
wi =/= 0, our results reduce to those in the monostable 
case, as they should. 



III. RESULTS 

We begin by showing that the analytical calculations 
presented in the previous section can be carried out nu- 
merically in a stable fashion, both away from and in the 
critical regime. We recall that the inform atio n transmis- 
sion is determined by an integral Z [Eq ( 36 )] , and that 



because we are working in the small noise approximation 
we have a choice of evaluating this as an integral over the 
input concentration c or an integral over the mean out- 
put concentrations g. Figure [3] shows the behavior of the 
integrands in these two equivalent formulations when we 
have chosen parameters that are close to the critical point 
in a self-activating gene. The key result is that, once 
we include terms beyond the Gaussian approximation to 



P(g\c) following the discussion in Section HE we really 
do have control over the calculation, and find smooth re- 
sults as the parameter values approach criticality. Thus, 
we can compute confidently, and search for parameters 
of the regulatory function 9 = {ci/ 2l K c , n c , K g , n g } that 
maximize information transmission. 

We start the optimization by choosing the parameter 
values 8 g = {n g ,K g } which describe the self-interaction 
term, and then holding these fixed while we optimize the 
remaining ones, 9 C = {c\/ 2l n c , K c }. In all these opti- 
mizations the parameter K c is driven to zero, and in this 
limit the MWC regulatory function of Eq ( 38 1 simplifies 
to something more like the Hill function, 



f(g,c) 
Hg) 



n g In 



1+9/ Kg 

1 + i/(2jg 



(89) 



Once we have optimized # c , we can explore the informa- 
tion capacity as a function of 9 g , at varying values of the 
remaining parameter in the problem, the maximal con- 
centration C of transcription factors. Figure [4pV maps 
out the "capacity planes," I*(n g ,K g ;C) at fixed C. In 
detail, we show I*(n g , K g -C)~ ^max(C) f° r three choices 
of our parameter C, where Ima X (C) is the information 
obtained with the optimal choice of n g and K g ; the best 
choice of parameters is depicted as a yellow circle in the 
capacity plane. 

For large values of C, C = 1, 10, the optimal solu- 
tion is at n g — > or K g — > 00 (magenta square in the 
lower right corner), which drives the self-activation term 



in Eq (38) to zero, towards a noninteracting solution. We 
have checked that these solutions correspond to optimal 
solutions for a single noninteracting gene found in our 
previous work [7]. As C is decreased, however, the opti- 
mal combination {n gi K g } shifts towards the left in the 
capacity plane (cyan square for C = 0.1), exhibiting a 
shallow but distinct maximum in information transmis- 
sion. If wc examine the mean input/output relations in 
Fig [4j3, we find nothing dramatic: the critical (red) so- 
lutions seem to have lower capacities (which we carefully 
reexamine blow), while other quite distinct parameter 
choices for {n g ,K g } nevertheless generate very similar 
mean input /output relations, because of the freedom to 
optimally choose 9 C parameters. The behavior of effec- 
tive noise in the input, cr c (c), given by Eq ( [l4| and shown 
in Fig[4jC, is more informative; recall that J dc cr~ 1 (c) 
is proportional to the information transmission. Non- 
interacting (magenta) solutions always have the lowest 



12 



A 


1 




0.8 




0.6 












0.4 




0.2 








0.5 

c 





FIG. 3: Computing information transmission close to the crit- 
ical point. A) An input/output relation g(c) for n g — 2, 
C — 1 and Kg = K g (n g ) + 0.004, showing a self-activating 
gene with an almost critical value of K g . B) Noise in the in- 
put from Eq ( 14 1, which is the integrand in the expression for 



information in Eq (361, shows an incipient divergence between 
red vertical bars. Inset: zoom-in of the peak shows that it can 
be sampled well by increasing the number of bins. Different 
plot symbols indicate domain discretization into 500 (black 
dots) and 5000 (red circles) bins. At the critical point the 
divergence is hard to control numerically. C) An alternative 
way of computing the same information, by integrating in the 
output domain as in Eq ( |69| |. Shown is the integrand in the 
gaussian noise approximation [black, Si (g) from Eq ( 66 1] and 
with quartic corrections [red, 84(g) from Eq |67l]. At the crit- 



ical point higher order corrections regularize the integrand, 
while away from the critical point the integrand smoothly 
joins with the gaussian approximation. This approach is sta- 
ble numerically both away from and in the critical regime. 
D) Information / = log 2 Z as a function of K g for n g = 2. 
Critical K* = 1/8 is denoted by a dashed red line. Integra- 
tion across g in the output domain with quartic corrections 
(squares) agrees well with the integration across c in the input 
domain (crosses) away from K g , but also smoothly extends 
to the critical K g . This is a cut across the capacity plane in 
FigflK (denoted by a dashed yellow line) for C = 1. 



amount of noise at high input concentrations (c ~ C). As 
the self-interaction turns on, the noise at high input in- 
creases, but that increase can be traded off for a decrease 
in noise at medium and low c. While for low C = 0.1 
the critical (red) solution is never optimal, the solution 
with some self-activation manages to deliver an addi- 
tional ~ 0.2 bits of information. We have verified that 
for lOx smaller value of C = 0.01 the capacity plane is 
qualitatively the same, exhibiting the peak at a nontriv- 



Intuitively, the self-activation parameters 9 g have three 
direct effects on the information transmission: they 
change the shape of the input/output curve, the self- 
activation feeds some of the output noise back into the 
input, and the time r (protein lifetime) that averages 
the input noise component gets renormalized to r — > 
t(1 — df I 'dg)~ x . The changes in the mean input/output 
relation can be partially compensated for by the corre- 
lated changes in the # c , as we observed in our optimal 
solutions, suggesting that regardless of the underlying 
microscopic parameters, it is the shape of g(c) itself that 
must be optimal. The increase in averaging time acts to 
increase the information, thus favoring self-activation. 
However, this will simultaneously increase the noise in 
the output that feeds back, as well as drive g(c) towards 
infinite steepness at criticality, restricting the dynamic 
range of the output. At low C there is a parameter regime 
where increasing the integration time will help decrease 
the (dominant) input noise enough to result in a net gain 
of information. At high C, input noise is minimal and 
thus this averaging effect loses its advantage; instead, 
feedback simply acts to increase the total noise by rein- 
jecting the output noise at the input, so that optimiz- 
ing information transmission drives the self-interaction 
to zero. 

Next we examine in detail the behavior of information 
transmission close to the critical region. Close to, but not 
at, the critical point we perform very fine discretization 



of the input range to evaluate the integral in Eq ( 36 ) , as 
reported in Fig [3j3. To validate that the information in- 
deed reaches a maximum at nontrivial values of {n g , K g } 
when C = 0.1, we cut through the capacity plane in 
Fig|4K along the yellow line at rig = 2, and display the 
resulting capacity values in Fig pK (the results are nu- 
merically stable when integrated on 10 4 or 10 3 points). 
Unlike for C = 1 and C = 10, for C = 0.1 the maximum 
is clearly achieved for a nontrivial value of K g , but away 
from the critical line, confirming our previous observa- 
tions. We further examine the capacity directly on the 
critical line, K* = (n g — l) 2 /(4n s ), as a function of n g 
at C = 1 (denoted in Fig[4j\ with dashed red line). The 
capacity in this case can be calculated using Eq ( 69 1 and 
is shown in Fig[5}3. The capacity that includes quartic 
corrections is higher by ~ 0.05 — 0.1 bits than in the gaus- 
sian approximation, making the effect small but notice- 
able. We also confirmed that the capacity at the critical 
line joints smoothly with the capacity near the line, i.e. 
that there is no jump in capacity exactly at criticality, 
which presumably would be a sign of numerical errors. 
Figure [5p finally validates that across the whole range 



of n g for C 



1, small increases in K g above the critical 



ial (but still not critical) choice of {n g 
(not shown). 



0.51,^ « 0.11} 



value K*(n g ) always lead to an increase of information, 
demonstrating that the maximum is not achieved on the 
critical line. 

We next turn to the joint optimization of all parame- 
ters and plot the information transmission as a function 
of C in Fig [6] As we have discussed, optimization drives 
the strength of self-activation to zero for C > 1 (but 
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FIG. 4: Information transmission as a function of self-activating parameters {n g ,K g } for three values of C; C = 0.1 (left col- 
umn), (7 = 1 (middle column), C = 10 (right column). For each value of {n g , K g } the three remaining parameters {c\/2,n c , K c } 
are optimized to maximize information. A) The capacity planes showing the decrease in capacity in bits (colormap) from the 
maximal value achieved for an optimal choice of all parameters; the optimal set of parameters is denoted by a colored square 
on a yellow circle. The white upper left region of each capacity plane corresponds to the bistable region. For low C = 0.1 the 
maximum is achieved in the interior of the domain (cyan square), while for C — 1, 10 n g is driven to (magenta squares), 
corresponding to the non-self-interacting solution. Red square represents an example critical system at n g — 2 and the green 
square represents a self-interacting system with a high value for n g . The yellow dashed line in C — 1 plane represents a cut 
shown in detail in Fig[3p. The yellow solid line in C = 0.1 plane represents a cut shown in detail in Fig[5}\. The red dashed 
boundary of the critical region in C = 1 plane is analyzed in detail in Figs [5j3, |5p. B) Input/output relations for 4 example 
systems denoted by colored squares in A. Despite substantially different values for {n g ,K g }, the optimization of remaining 
parameters makes the input/output curves look very similar to the optimal solutions, except for the critical (red) curves. C) 
The effective input noise o c {c) for the selected systems. 



see below for self-repression), and at these high values 
of C the result of full optimization coincides with the 
non-interacting case. As C falls below one, the gain in 
information due to self-activation is increased, reaching a 
significant value of about a bit for C = 0.01. As we have 
noted in Section |II C[ the self-activating effect of g on 
its own expression can be changed into a self-repressing 
effect by simply flipping the sign of the parameter n g . To 
explore the optimization of such self-repressing genes, we 
thus optimized the parameters as before, now constrain- 
ing n g < 0. Results in {n g ,K g } plane are shown in Fig 
[7j for C = 1 and C = 10. 

We find that, for large C, the optimization process 



drives both K c and K g toward zero, so that the effective 
input/output relation is given by 

/( ^ = ^ + c C(%r- ' (90) 

with nonzero values of n g being optimal. Why is self- 
repression optimal at large C, when self-activation is 
not? Self-repression suppresses noise at high concentra- 
tions of the input (red vs magenta curves Fig [7p for 
C = 10) and allows the mean input/output curve to be 
more linear than in the non-interacting case (Fig[7|3), ex- 
tending the dynamic range of the response. Both these 
effects serve to increase information transmission. 
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FIG. 5: Information transmission close to the critical line. A) A detailed scan of information for C = 0.1 as the function 
of Kg for n g — 2 and optimal choice for 9 C for every K g exhibits a peak for a nontrivial value of K g . B) The information 
transmission along the critical line for C = 1 as a function of n g indicates that while including quartic corrections is important, 
the information at the critical line does not exhibit any large jumps (not shown). C) A detailed scan of capacity close to the 
critical line for C — 1 as a function of n g (vertical axis). Shown in color is the increase in information in bits compared to the 
value very close to the critical I*(n g ), as a function of the distance in K g from the critical value (horizontal axis). Across the 
range of n g , going from the critical axis into the monostable domain increases the information. 



It is remarkable that when we put together the self- 
activating and self-repressing solutions, we see that they 
join smoothly at C — 1 (Fig[6|: self-activation is optimal 
for C < 1 , and self-repression is optimal for C > 1 , while 
precisely at C = 1 the system that transmits the most 
information is non-interacting. 

All of this discussion has been in the limit where the 
maximal concentration of output molecules, 7 max , is the 
same as the maximal concentration of input molecules, 
C, so there is only one parameter that governs the struc- 
ture of the optimal solution. This makes sense, at least 
approximately, since both input and output molecules are 
transcription factors, presumably with similar costs, but 
nonetheless we would like to see what happens when we 
relax this assumption. Intuitively, if we let 7 max become 
large, the system can achieve the advantages of feedback 
while the impact of noise being fed back into the system 
should be reduced. 



If we look at Eq (36 1 for Z, which controls the infor- 

■> oo to find 



mation capacity, we can take the limit 7„ 



Z = 



dc 



9+ Tfc 



(91) 



Now the only place where feedback plays an explicit role 

_ i 

is in the term (l — ^ J , which comes from the length- 



ening of the integration time, which in turn serves to av- 
erage out the noise in the system. All other things being 
equal (which may be hard to arrange), this suggests that 
information transmission will be maximized if the system 
approaches the critical point, where df /dg — > 1. The dif- 
ficulty is that the system can't stay at the critical point 
for all values of the input c, so there must be a tradeoff 



between lengthening the integration time and using the 
full dynamic range. 

To explore more quantitatively, we treat 7 max /C as 
a parameter. When C is small, we know that self- 
activation is important, and in this regime we see from 
Fig[8]that changing 7 max /C matters. On the other hand, 
for large values of C we know that (at 7 max /C = 1) 
optimization drives self-activation to zero, so we expect 
that there is less or no impact of allowing 7 max /C ^ 
1. We also see that, for a fixed small C, increasing 
7m ax /C drives the system closer towards the signatures 
of criticality — nonmonotonic behavior in the noise and a 
steepening of the input/output relation. In more detail, 
we can plot the value of max c (df/dg) as a function of 
C and 7 max /C, that is, check for each of the solutions in 
Fig|8j\ how close the partial derivative df /dg comes to 1, 
which is a direct measure of criticality. We confirm that, 
for the simultaneous choice of small C and large "f max /C, 
we indeed have df /dg — > 1. In the extreme, if we choose 
C = 0.01 and 7 max /C = 10 4 , we find that the optimal K c 
and Kg are driven towards small values (but since c, C 
are small K c is not negligible); the optimal n g ps 1.0662. 
With this value of n g , the corresponding critical value for 
K g would be K*(n g ) = 0.001, and the numerically found 
optimal value in our system is K g = 0.0012. The critical 
value for g* would be g*{n g ) = 0.031, and indeed at this 
small value the optimal mean input /output relation has 
a strong kink, the effective noise a c has a sharp dip and 
df /d g at this point climbs to 0.9936. Numerically, there- 
fore, we have all the expected indications of emerging 
criticality at very large 7 max . For less extreme values, we 
expect the optimum to result from the interplay between 
the input and transmitted noise contributions, which in 
general need not be on the critical line. 

To complete our exploration of the optimization prob- 
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lem, we have to consider parameter values for which the 
output has two locally stable values given a single input. 
Quantitatively, in the bistable regime we have to solve 
for both stable solutions <?i(c), with i = 1,2, and for the 
unstable branch (73(c). We can then evaluate the equilib- 
rium probabilities Wj(c ) of being on either of the stable 
branches using Eq ( |87| ) , and use Eq (851 to compute the 
capacity. As shown in an example in Fig [9]4., we never 
find the optimal solutions in the bistable region — the ca- 
pacity starts decreasing after crossing the critical line. 
Consistently with our argument that output and feed- 
back noise must become negligible for the regime of small 
C and large 7 max /C, we find that optimization drives the 
system towards achieving maximal transmission closer 
and closer the the critical line (which is approached from 
the monostable side), as shown in Fig[9h. 



IV. DISCUSSION 

To summarize, we have analyzed in detail a single, self- 
interacting genetic regulatory element. As in previous 
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FIG. 6: Information transmission in a system where all pa- 
rameters 8 are optimized as a function of the maximal input 
concentration C. The self-interacting system (red) allows for 
an arbitrary MWC-like regulatory function [Eq (38 1] with pa- 
rameters — {ci/2, Tic, K c , rig, K g }. The noninteracting sys- 
tem (black) only has the MWC parameters n c ,K c and the 
leak L (see Ref [7]) which can be reexpressed in terms of 
Ci/2. Bright red line with circles shows self-activating solu- 
tions which are optimal for C < 1, while dark red line with 
crosses shows self-repressing solutions, optimal for C > 1. 
Plotted on the secondary vertical axis in green is the ratio 
between the self-interacting contribution to F, and the input 
contribution to F in the expression for the MWC regulatory 
function [Eq ( |38[ )]. For C ~ 1 where the interacting and non- 
interacting solution join, this term falls to 0, as expected. 
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FIG. 7: Information transmission as a function of self- 
repressing parameters {n g ,K g } for C — 1 (left column) and 
C = 10 (right column); plot conventions are the same as in 
Fig [4] A) The capacity decrease from the maximum value 
(achieved at the parameter choice indicated by a yellow cir- 
cle) as a function of {n g ,K g }. The maximum information 
transmission is achieved for a non-interacting case (magenta) , 
for (7=1. In contrast, for C = 10, there is a non-trivial op- 
timum for small values of K g and n g ~ —2.5 (red). B) The 
mean input/output solutions for three example systems from 
A (red, magenta, cyan). C) The effective noise in the input, 
<7 c (c), for example solutions in A. 



work, we based our analysis on three assumptions: (i) 
that the readout of the information I(c; g) between the 
input and output happens in steady state, (ii) that noise 
is small; and (iii) that the constraint limiting the infor- 
mation flow is the finite number of signaling molecules. 
In addressing a system with feedback, assumption (ii) 
requires technical elaboration near the critical point, as 
discussed above. But (i) requires a qualitatively new dis- 
cussion for systems with feedback, because of the possi- 
bility of multistability. 

Our analysis, with the steady state assumption, shows 
that truly bistable systems do not maximize the infor- 
mation. Intuitively, this stems from the branch ambigu- 
ity: for a given input concentration c a bistable system 
can sit on either one of the stable branches with some 
probability, and this uncertainty contributes to the noise 
entropy, thereby reducing the transmitted information. 
But reaching steady state involves waiting for two very 
different processes. First, the system reaches a steady 
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FIG. 8: The dependence of information transmission in the 
self- activating case on the ratio 7max/C'. A) For various 
choices of C indicated on the plot, the information transmis- 
sion with the optimal choice with respect to all parameters 
is shown as a function of 7max/C*. Two special systems of 
interest (blue, green) are chosen for the lowest value of C. B) 
The mean input/output relation for the blue and green sys- 
tem. The green system has a higher transmission, a steeper 
activation curve but a smaller dynamic range. C) The effec- 
tive noise in the input, er c (c), for the blue and green systems. 
The green system is closer to critical at the point where the 
mean input/output curve has the highest curvature and the 
noise exhibits a dip. 



distribution of fluctuations in the neighborhood of each 
stable point, and then the populations of the two stable 
states equilibrate with one another. As with Brownian 
motion in a double well potential (or a chemical reac- 
tion), these two processes can have time scales that are 
separated by an exponentially large factor. 

Alternatively, the timescales of real regulatory and 
readout processes could be such that the system does not 
have the time to equilibrate between the stable branches. 
In that case, the history (initial condition) of the system 
will matter, and the final value of the output g will be de- 
termined jointly by the input c and the past state of the 
output, go. Such regulatory elements can be very useful, 
because they retain memory of the past and are able to 
integrate it with the new input; a much studied biologi- 
cal example is that of a toggle switch. The information 
measure we use here, I(c;g), will not properly capture 
the abilities of such elements, unless we modify it to in- 
clude the past state, e.g. into I({c, go};g)' here both the 
input and current state together determine the output. 
Such computations are beyond the scope of this paper, 
but could make precise our intuitions about switches with 
memory. 

Multistability also allows for qualitatively new effects 
at higher noise levels. In our previous work we found that 
full information flow optimization (without assuming 
small noise) leads to higher capacities than a small-noise 
calculation for an identical system and, moreover, that 
as noise grows, the optimal solutions start resembling a 
(noisy) binary switch where only the minimum and max- 



imum states of input are populated in the optimal P* n (c) 
[4] . At high noise, positive (even bistable) autoregulation 
could stabilize these two states and make them more dis- 
tinguishable. In this case the design constraint for the 
genetic circuit is to use the smallest number of molecules 
that will prevent spontaneous flipping between the two 
branches on the relevant biological timescales [T5] . In 
this limit regulatory elements can operate at high noise, 
with perhaps as few as tens of signaling molecules. 

With these caveats in mind, our main results can be 
summarized as follows. Except at C ~ 1, the possibil- 
ity of self-interaction always increases the capacity of 
genetic regulatory elements. For C < 1, the optimal 
strategy is self-activation, while for C > 1 it is self- 
repression, as shown in Fig [6j Self-repression allows 
the system to reduce the effective noise at high input 
levels and straighten the input /output relation, packing 
more "distinguishable" signaling levels into a fixed input 
range. Self-activation for small C lengthens the effective 
integration time over which the (dominant) input noise 
contribution is averaged, thereby increasing information. 
The optimal level of self-activation is never so strong 
as to cause bistability, but does, for small C and large 
7max/C, push the optimal system towards the critical 
state. 

An interesting observation about the nature of 
the optimal solutions is that self-activation which is 
strong enough to enhance information transmission may 
nonetheless not result in a functional input/output re- 
lation that looks very different from a system without 
self-activation, albeit with different parameters. In such 
cases, information transmission is enhanced primarily by 
the longer integration time and reduced effective noise 





FIG. 9: Crossing the critical line (dashed red) into the bistable 
region. A) Capacity as a function of n g at fixed K g = 0.05, 
with the optimal choice of {n c , K c , C1/2} parameters, for three 
values of C (C = 0.1,1,10 dark to bright red, respectively). 
Dots show capacity calculation using the bistable code that 
can handle multiple branches using Eq ( |85| l, solid line uses 
the monostable integration as in Eq ( 36 1 . B) Optimal capac- 
ity at very high ratios 7 max /C = 10" for different value of C 
(C = 10~ 3 , 10~ 2 , 10 _1 , 1: circles, crosses, squares, stars, re- 
spectively). The optimum is pushed towards the critical line 
from the monostable side for 7 max /C large and C small. In 
all cases, information in the bistable regime is smaller than in 
the monostable regime. 
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level. This means that there need be no dramatic sig- 
nature of self-activation, so that diagnosing this oper- 
ating regime requires a detailed quantitative analysis. 
More generally, this result emphasizes that the same phe- 
nomenology can result from different parameter values, 
or even networks with different topology — in this case, 
with and without feedback. 

Stepping back from the detailed results, our goal in 
this paper was to make progress on understanding the 
optimization of information flow in systems with feed- 
back by studying the simplest example. The hope is that 
our results provide one building block for a theory of 
real genetic networks, on the hypothesis that they have 
been selected for maximizing information transmission. 
As discussed in previous work [31 [SJ [7] , a natural target 
for such analysis is the well studied gap gene network in 
the early Drosophila embryo [43] . But we can also hope to 
connect with a broader range of examples. The prevail- 
ing view of self-activation has been that its utility stems 
from the possibility of creating a toggle (or a flip-flop) 
switch. This explanation, however, can only be true if 
self-activation is strong enough to actually push the sys- 
tem into the bistable regime. De Ronde and colleagues 
|36j have improved on this intuition and have shown, in 
the linear response limit, that weak self-activation will 



increase the signal to noise ratio for dynamic signals, a 
function very different from the switch. Here we show 
that in the fully nonlinear, but steady state treatment, 
monostablc self-activation can be advantageous for infor- 
mation transmission. Furthermore, we show that there is 
a single control parameter, the ratio C between the out- 
put and input noise strengths, which determines whether 
self-activation or self-repression is optimal. Since more 
and more quantitative expression data is available, espe- 
cially for bacteria and yeast, one could try assessing how 
the use of both motifs correlates with the concentrations 
of input and output signaling molecules. 
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